FIELD OF THE INVENTION

The present invention relates to the introduction of visual content into Karaoke 
applications, beyond a standard background video or the song text, and in particular it 
relates to the introduction of such into broadcast Karaoke applications (whether by 
radio, cable, the internet or otherwise). 

BACKGROUND AND PRIOR ART 

In typical Karaoke applications, the Karaoke text is displayed at the bottom of the 
screen to assist the viewer to sing along with the music or song. The Karaoke text is 
encoded together with the background video. Such an approach uses up significant 
transmission bandwidth and is not commercially viable in broadcast applications. 

It is known to save transmission bandwidth by transmitting one background video for 
use with multiple Karaoke songs. In this approach, many Karaoke audio elementary 
streams and their associated Karaoke text elementary streams containing the text 
and scrolling information are broadcast at the same time, and each of the songs uses 
the same video content. As a result, the user has more choices for Karaoke songs 
without increasing transmission bandwidth significantly. Broadcasters also benefit by 
inserting advertisements onto the video, as well as by providing value added Karaoke 
applications to the viewer. However, any such other textual content or advertisements 
to be displayed are encoded onto the video. 

One aspect of Karaoke is that the requirement for singing is not constant throughout 
a song. There may be a prelude or introduction where no singing is required, 
significant portions in the middle where no singing is required and a postlude where 
no singing is required. These are times when the singer is doing nothing and they are 
recognised as periods in which other things can be done instead. 






For instance, according to patent publication JP2001-350482A, Karaoke data can
include time interval information indicating time bands of non-singing intervals. For a
performance, this information is compared with presentation time information relating
to a spot programme. The spot programme whose presentation time is closest to the
non-singing interval time is displayed during that non-singing interval.

Patent publication JP7-271387 describes a recording medium, which not only records
song audio and text information, but also text picture information corresponding to a
text picture other than the song text, which is to be displayed. This can be used to
avoid a situation in which a singer merely listens to the music and waits for the next
step during a musical prelude or interlude.

Japanese patent publication No. JP10-124071 describes a hard disk drive provided
with a music data storage part which stores music data on pieces of karaoke music
and a music information database which stores information regarding albums
containing these pieces of music. In the music data, a flag is provided showing
whether or not the music is one contained in an album. A controller determines if a
song is one for which the album information is available. During an interval for a song
where the information is available, data on the album name and music are displayed
as a still picture.

Japanese patent publication No. JP10-268880 describes a system to reduce the
memory capacity needed to store respective image data, by displaying still picture
data and moving picture data together according to specific reference data. Genre
data in the header part of Karaoke music performance data is used to refer to a still
image data table to select pieces of still image data to be displayed during the
introduction, interlude and postlude of the song. The genre data is also used to refer
to a moving image data table to select and display moving image data at times
corresponding to text data.

The aim of the present invention is to enable the improved insertion of textual and
interactive contents for use in Karaoke applications, for instance during non-singing
intervals. Ideally, it is intended to enable a commercially viable scheme that fits digital
TV broadcasting standards.



SUMMARY OF THE INVENTION

According to one aspect of the present invention, there is provided a method of
encoding Karaoke applications comprising:

encoding a background video signal for use with one or more Karaoke songs;

encoding one or more Karaoke songs;

encoding Karaoke song texts associated with said one or more songs, to be
displayed in a karaoke text display; and

encoding visual contents for display outside the Karaoke text display during
playing of said one or more Karaoke songs, as private section data.

According to another aspect of the present invention, there is provided a method of
providing audio and video Karaoke signals comprising the steps of:

receiving Karaoke applications encoded according to the above method;

decoding said encoded background video signal;

decoding said encoded one or more Karaoke songs to provide an audio signal;

decoding the encoded one or more Karaoke song texts associated with the one or
more decoded songs;

decoding the encoded visual contents; and

combining said background video signal, said one or more Karaoke song texts
and said visual contents to form a video signal, with the one or more Karaoke song
texts in a karaoke text display and said visual contents in a region outside the karaoke
text display during some or all of the one or more songs.

According to a third aspect of the present invention, there is provided apparatus for
supplying Karaoke applications comprising:

video encoding means for encoding a background video signal for use with
multiple Karaoke songs;

song encoding means for encoding Karaoke songs;

text encoding means for encoding Karaoke song texts associated with said songs,
for display in a karaoke text display; and

visual contents encoding means for encoding visual contents for display outside
the Karaoke text display during playing of said Karaoke songs, as private section
data.

According to yet another aspect of the present invention, there is provided
apparatus for providing audio and video Karaoke signals comprising:

receiving means for receiving Karaoke applications encoded according to the
method of the first aspect or by the apparatus of the third aspect;

video decoding means for decoding the encoded background video signal;

song decoding means for decoding the encoded Karaoke songs to provide an
audio signal;

text decoding means for decoding encoded Karaoke song texts associated with
said decoded songs;

visual content decoding means for decoding the encoded visual contents; and

combining means for combining said background video signal, said one or more
song texts and said visual contents to form a video signal such that the song texts are
displayed in a karaoke text display and said visual contents are displayed in a region
outside the karaoke text display during some or all of the one or more songs.

According to a further aspect of the present invention, there is provided apparatus for
use in editing visual contents for display during Karaoke singing sessions, said
apparatus comprising:

means for retrieving a stored karaoke text elementary stream;

means for determining an edit permission status within the retrieved karaoke text
elementary stream;

means for editing said visual contents if permitted by the edit permission status;
and

means for forwarding the edited visual contents for storage.

With the present invention, visual content can be displayed anywhere, e.g. over the
background video, away from the karaoke text, over the area occupied by the
karaoke text, but not part of the text display (during non-singing periods) or over both
(at different times or simultaneously).

This invention provides additional advertising space as well as the opportunity to
develop interactivity in Karaoke applications. It also creates an option for the content
creator to insert relevant textual and interactive contents prior to distributing to the
media distribution companies or broadcaster. The textual and interactive information
is encoded as data for more effective transmission. This invention enables an efficient
method of introducing relevant textual and interactive contents in Karaoke applications
to the user at minimal cost.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be further described by way of non-limitative example
with reference to the accompanying drawings, in which:-

Figure 1 is a schematic drawing of an embodiment of apparatus for multiplexing 
textual and interactive contents in a Karaoke application; 

Figure 2 illustrates a Menu Tree Application; and

Figure 3 illustrates a screen which would appear to a karaoke user, when a signal is 
provided using the present invention. 

DESCRIPTION

Karaoke content is made up of audio and text and scrolling information for the
sing-along display, as well as textual and interactive contents in the present invention. The
Karaoke songs are encoded by using a relevant digital TV audio encoding standard
such as MPEG Layer II or AC-3, and subsequently stored as audio elementary
streams. The Karaoke texts and scrolling information, together with additional textual
and interactive contents are encoded as Karaoke text elementary streams. Thus, for
each Karaoke song, the content creator creates two files, one for an audio elementary
stream and the other for a Karaoke text elementary stream. These files are stored in
a database and can be distributed to other media distribution companies and
broadcasters.

The present invention is exemplified by way of encoding using MPEG-2, although it is
applicable to other formats.

In encoding for distribution as a transport stream, the Karaoke background content is
produced and encoded into a video elementary stream. It can be coded at a lower bit
rate, allowing more space for transmitting Karaoke audio songs. A single background
video is used for every Karaoke song to reduce the total transmission bandwidth.
The video stream is multiplexed with many Karaoke audio streams and associated
private data streams that contain the Karaoke text elementary streams. The Karaoke
data comes in the form of private section tables. The contents of the private section
tables deliver Karaoke program guides, the Karaoke text and scrolling information
associated with the audio, time reference information for synchronizing the scrolling
text and audio, as well as the textual and interactive contents and short video clips.
The media distribution provider may also edit or insert additional textual and
interactive contents embedded in the Karaoke text elementary streams prior to
transmission.

Encoding of the textual and interactive contents is performed in two stages. In the first
stage, the textual and interactive contents are embedded in the Karaoke text
elementary stream and subsequently stored in the database. The Karaoke text
elementary stream provides sufficient information for the Karaoke decoder in the
receiver to display Karaoke text with scrolling colours to signify the singing tempo, as
well as the intended textual display relevant to the Karaoke application. The
embedded textual content may contain advertising messages or other relevant
information. Interactive content such as textual display links or web-links for
Internet-TV may also be inserted in this stage. The Karaoke text elementary stream and its
associated audio elementary stream can be distributed to other media distribution
companies. Since the stage 1 textual and interactive contents are embedded in the
Karaoke text elementary stream, the content creator has the option of placing
relevant information intended for decoding before distributing to other media
distribution companies.

Prior to broadcast, the media distributor or broadcaster may further edit or add to the
textual and interactive contents. Thus, the stage 2 textual and interactive contents are
further embedded to form the final Karaoke text elementary stream for decoding in
the receiver.

During any non-singing intervals, the Karaoke text region can be designed or
constructed to include user information, news flashes or advertisements. Throughout
the Karaoke application, textual displays can be inserted outside the Karaoke text
region and interactivity can be added to provide links relevant to the Karaoke content.

Functional modules for multiplexing textual and interactive contents for use in a
Karaoke application constructed in accordance with an embodiment of the invention
are shown in Figure 1.

A Karaoke source 12 delivers an audio signal 14 (e.g. music) to an Audio Capture
and Encoding module 16, where it is captured and recorded using MPEG Layer II
(although other suitable audio compression standards such as AC-3 or the like can be
used). One of the outputs of the Audio Capture and Encoding module 16 is the audio
elementary stream 18, which is forwarded to and stored in a Karaoke Database 20.

A content creator (e.g. a person) uses a first user editing terminal 22 to create and 
edit Karaoke text and timing information 24, as well as textual and interactive contents 
26. 

This first user editing terminal 22 is generally used for songs and scoring scrolling
information. A Textual and Interactive Encoding module 28 converts the user input
textual and interactive contents 26 to suitable format textual and interactive data 29
for further processing. The output Karaoke text and timing information 24, from the
user editing terminal 22, and textual and interactive data 29, from the Textual and
Interactive Encoding module 28, are sent to a Karaoke Text Elementary Stream
Encoding module 30. These outputs are joined there by another output 31 of the
Audio Capture and Encoding module 16. The Karaoke Text Elementary Stream
Encoding module 30 is a tool for assisting the content creator. It integrates the input
data streams 24, 28 to form the complete Karaoke text elementary stream 32. This
too is forwarded to and stored in the Karaoke Database 20.

The Karaoke text elementary stream 32 provides sufficient information for a Karaoke
decoder in a receiver to display Karaoke text with scrolling colours to signify the
singing tempo associated with the audio elementary stream 18, as well as to generate
a textual display over the Karaoke text region during a non-singing period. It also
contains information for generating additional textual display outside the Karaoke text
region throughout the Karaoke application. The Karaoke text elementary streams 32
and the associated audio elementary streams 18 may be distributed to other media
distribution companies or broadcasters for transmission. Since the stage 1 textual and
interactive contents are embedded in the Karaoke text elementary stream, the content
creator has the option of placing relevant content intended for decoding
before distributing to other media distribution companies. However, prior to broadcast,
the media distributor or broadcaster may further edit or add to the textual and
interactive data.

A second user editing terminal 34 is used to edit textual and interactive contents 36,
such as graphics and interactive data, including short video clips. The content format
for textual and interactive data from the second user editing terminal 34 is the same
as from the first user editing terminal 22. This second terminal 34 is normally located
on-line and is able to add new content to an existing database (or to replace existing
content), whereas the first terminal 22 is used to develop the Karaoke database and
is normally located off-line.

For example, Company A (content creation service provider) creates a content
"Karaoke_Text_Elementary_Stream" (including the associated audio elementary
stream), and stores it in a database using the first user editing terminal 22. Such a
stream may be freely distributed for placing advertisement textual information.
Alternatively, it could be distributed to a broadcaster or service provider for a fee.

When such streams are broadcast to the consumer, Company B (broadcaster/service
provider) may not wish to use this content for a non-text region, that is a region
outside where the Karaoke text is displayed and scrolled, but may want to add its own
visual data at that point. The broadcaster or service provider may then edit the
Karaoke Text Elementary Stream to generate textual content that is different from the
original content, using the second user editing terminal 34. However, prior to adding
or replacing the relevant content by way of changing a descriptor, the user of the
second user editing terminal 34 needs to check the status of a distribution flag in the
data. This flag determines whether Company B has the agreement of the content
creation provider to make such changes. Company B cannot modify any existing
Textual_Presentation_Descriptor() that is "mandated". However, it can add a new
Textual_Presentation_Descriptor() to display a particular content in an unused time
interval or display region. Since each Textual_Presentation_Descriptor has its own
Distribution_Flag, Company B could mandate the display of this new content.

The user input textual and interactive contents 36 are converted by a Textual and
Interactive Encoding module 38 to a suitable data format for further processing. The
encoded textual and interactive data 40 are then delivered to a Karaoke Application
Encoder 42.

The Karaoke Application Encoder module 42 is a tool for assisting the broadcaster to
generate the Karaoke application. As well as the encoded textual and interactive data
40, the Karaoke Application Encoder 42 also receives extracted audio elementary and
text elementary streams 44 from the Karaoke Database 20. The Karaoke Application
Encoder module 42 integrates the encoded textual and interactive data 40 with the
extracted Karaoke text elementary streams to generate Karaoke Textual and
Interactive Description Tables 46.

Through a connexion to the Karaoke Database 20, a broadcaster can also use the
second editing terminal 34 to select a list of Karaoke songs to be broadcast. Control
signals 48 extract the selected Karaoke text elementary stream and audio elementary
stream files from the Karaoke Database 20, for processing within the Karaoke
Application Encoder 42. Subsequently, the Karaoke Application Encoder 42
generates respective program guide tables 50 listing the available Karaoke songs.

Finally, the Karaoke Application Encoder 42 also generates time reference tables 52
(including time reference information for synchronising karaoke scrolling text and
audio) based on the various inputs. The Karaoke Application Encoder 42 runs its own
system time clock. This clock is used as a reference when encoding the "Time
Reference Table" and the "Karaoke Textual and Interactive Description Table". The
"Time Reference Table" uses the values in the system time clock directly. Any
existing clock information in the "Karaoke Textual and Interactive Description Table" is
processed to determine the rate of data delivery from this encoder. This clock
information is recalculated to synchronize to the system time clock. Prior to delivery,
the clock information is updated to be in line with the system time clock, thus enabling
synchronization in a subsequent decoding process in a receiver.

The Karaoke Textual and Interactive Description Tables 46, the guide tables 50 and
the time reference tables 52 are all private data tables, exemplary formats of which
are described later. Private data tables are used as they allow additional data to be
transmitted beyond that available in the elementary streams. Private data can also be
used as a means for carrying updating software. For example, MPEG-2 specifies
packets comprising a PES (packetised elementary stream) and section tables. Each
section table is identified by a "table_id" value. Section tables between values 0x40
and 0xFE are considered as private section tables. Private data fits into the private
section tables.
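
The table_id range test described above can be sketched as follows. This is only an illustrative sketch: the function and constant names are hypothetical and are not defined by MPEG-2 or by this specification; only the range 0x40 to 0xFE is taken from the text.

```python
# Hypothetical helper names; only the range 0x40..0xFE comes from the description.
PRIVATE_TABLE_ID_MIN = 0x40
PRIVATE_TABLE_ID_MAX = 0xFE


def is_private_table_id(table_id: int) -> bool:
    """Return True if an 8-bit table_id lies in the private section range."""
    if not 0 <= table_id <= 0xFF:
        raise ValueError("table_id must fit in 8 bits")
    return PRIVATE_TABLE_ID_MIN <= table_id <= PRIVATE_TABLE_ID_MAX
```

A receiver could use such a check to route sections carrying Karaoke private data to the Karaoke decoder and pass all other sections to its normal section handling.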

The audio elementary streams 54 from the Karaoke Application Encoder 42 pass
through to an Audio PES Encoding module 56, where they are encoded into audio
packetized elementary streams (Audio PES) 58. There are multiple audio packetised
elementary streams 58, each of which is associated with a Karaoke Textual and
Interactive Description Table 46. The Audio PES Encoding module 56 uses the
system time clock of the Karaoke Application Encoder 42 when encoding the Audio
PES 58.

Thus, all the outputs from the Karaoke Application Encoder 42, together with the 
Audio PES 58 are in separate streams but are synchronised through clock signals. 

In parallel with the audio and text streams mentioned above, in a video section, a
video source 60 supplies a Karaoke video signal 62 to a Video PES Encoding module
64, where it is encoded into a video packetized elementary stream (video PES) 66.
The video signal 62 forms the Karaoke video background that is used for all the
Karaoke song selections.

The Karaoke Textual and Interactive Description Tables 46, the guide tables 50 and
the time reference tables 52 from the Karaoke Application Encoder 42, the audio PES
58 from the Audio PES Encoding module 56 and the video PES 66 from the Video
PES Encoding module 64 are all input to a Multiplexing module 68. There they are all
multiplexed into a transport stream (TS) 70.

MPEG-2 defines private section tables to carry user or private data. This invention
uses the format of private section tables and further defines the semantics of such
tables. As private section tables are standard, standard decoders can retrieve the
private data. However, the semantics for such a decoder need to be developed to
support such implementation. The Guide Tables and the Time Reference Tables can
be as defined in Singapore Patent No. 85646.

In a decoder, users can view and select the available Karaoke songs that are being
broadcast through the use of program guide tables. As the Karaoke session is in
progress, the textual and interactive contents are decoded and displayed in the
receiver. By using pre-assigned remote control buttons, the user may navigate
through the interactive programs that are relevant to the Karaoke application.

An example of the formatting of various information will now be described. In this
embodiment, the Karaoke_Text_Elementary_Stream describes the Karaoke text and
the scrolling information, as well as the textual and interactive contents. The
synchronization timing information for the audio does not reside in the
Karaoke_Text_Elementary_Stream. It is embedded in the Karaoke Textual and
Interactive Description Table.

The syntax of the Karaoke_Text_Elementary_Stream() is illustrated in Table 1. 

Table 1

Syntax                                          No. of bits
Karaoke_Text_Elementary_Stream() {
  ISO_639_Language_Code                         24
  Reserved                                      4
  Creation_Information_Data_Length              12
  For (I=0;I<N;I++){
    Creation_Information_Data                   8 or 16
  }
  Reserved                                      6
  Simultaneous_Scroll                           1
  Reserved                                      1
  Karaoke_Textual_Data_Length                   16
  For (I=0;I<M;I++) {
    Reserved                                    6
    Singing_Indicator                           2
    If (Singing_Indicator==0) {
      Start_Display_Time                        16
      ISO_639_Language_Code                     24
      Reserved                                  2
      Row1_Text_Length                          6
      For (J=0;J<Row1_Text_Length;J++){
        Text_Code                               8 or 16
      }
      ISO_639_Language_Code                     24
      Reserved                                  2
      Row2_Text_Length                          6
      For (J=0;J<Row2_Text_Length;J++){
        Text_Code                               8 or 16
      }
      For (J=0;J<Row1_Text_Length+1;J++){
        Time_Code                               16
      }
      For (J=0;J<Row2_Text_Length+1;J++){
        Time_Code                               16
      }
    }
    else {
      Descriptors_Loop_Length                   16
      For (J=0;J<N;J++){
        Descriptors()
      }
    }
  }
  Descriptors_Loop_Length                       16
  For (I=0;I<N;I++){
    Descriptors()
  }
  CRC_32                                        32
}

The semantic definitions are:

■ ISO_639_Language_Code - This 24-bit field contains the 3-character ISO 639
language code of the following text fields. Each character is coded into 8 bits
according to ISO 8859-1.

■ Creation_Information_Data_Length - This 12-bit field specifies the length in
bytes of the following Creation Information Data description. The content
creator may place relevant information in Creation_Information_Data.

■ Creation_Information_Data - The code defining the text character. The
number of bytes to represent each text character is determined by
ISO_639_Language_Code.

■ Simultaneous_Scroll - This 1-bit field specifies whether scrolling on the two
text display rows is done independently or simultaneously. A '0' refers to
scrolling independently. A '1' refers to scrolling simultaneously for dual language
applications.

■ Karaoke_Textual_Data_Length - This 16-bit field specifies the length in bytes
of the following Karaoke Textual Data description.

■ Singing_Indicator - See Table 2. Textual data for a non-singing section has a
non-zero Singing_Indicator value.

■ Start_Display_Time - This 16-bit field specifies the time for displaying the two
text rows. Each count is 20 msec.

■ Row1_Text_Length - This 6-bit field specifies the number of text characters in
the upper display row.

■ Text_Code - The code defining the text character. The number of bytes to
represent each text character is determined by the ISO_639_Language_Code
field.

■ Row2_Text_Length - This 6-bit field specifies the number of text characters in
the lower display row.

■ Time_Code - This 16-bit field specifies the scrolling information of an individual
text character. Each count is 20 msec.

■ Descriptors_Loop_Length - This 16-bit field specifies the total length in bytes of
the following descriptors.

■ Descriptors() - See Table 3. Available descriptors are
Textual_Presentation_Descriptor and Interactive_Links_Descriptor.

■ CRC_32 - This 32-bit field contains the CRC value. It can be used to check the
correctness of the data in this section.
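
As a small worked illustration of two of the field definitions above, the sketch below converts a 16-bit count (Start_Display_Time or Time_Code) to milliseconds using the stated 20 msec per count, and decodes the 24-bit language code as three ISO 8859-1 characters. The function names are hypothetical; whether a Time_Code is measured from the start of the song or from Start_Display_Time is not stated here, so the sketch only performs the unit conversion.

```python
COUNT_MS = 20  # each count of Start_Display_Time / Time_Code is 20 msec


def counts_to_ms(count: int) -> int:
    """Convert a 16-bit time count to milliseconds."""
    if not 0 <= count <= 0xFFFF:
        raise ValueError("count must fit in 16 bits")
    return count * COUNT_MS


def decode_language_code(field: bytes) -> str:
    """Decode the 24-bit ISO_639_Language_Code (three ISO 8859-1 characters)."""
    if len(field) != 3:
        raise ValueError("language code must be exactly 3 bytes")
    return field.decode("latin-1")
```

With a 16-bit field and 20 msec per count, the largest representable value, 0xFFFF, corresponds to 1,310,700 msec, or a little under 22 minutes.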

The definition of the Singing_Indicator is illustrated in Table 2. Textual data for a
non-singing section has a non-zero Singing_Indicator value. The textual data is described in
descriptors.



Table 2

Singing_Indicator Value   Descriptions
0                         Singing section.
1                         Non singing section at the start of the song.
2                         Non singing section at the end of the song.
3                         Non singing section at the middle of the song.
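
The four Singing_Indicator values can be modelled as an enumeration, as in the following sketch. The class and function names are hypothetical. The extraction of the indicator from a byte assumes the 6 reserved bits are transmitted first (most significant), so that the 2-bit indicator occupies the two least significant bits; that bit ordering is an assumption based on common MPEG practice, not something stated in this description.

```python
from enum import IntEnum


class SingingIndicator(IntEnum):
    SINGING = 0             # singing section
    NON_SINGING_START = 1   # non-singing section at the start of the song
    NON_SINGING_END = 2     # non-singing section at the end of the song
    NON_SINGING_MIDDLE = 3  # non-singing section in the middle of the song


def parse_singing_byte(byte: int) -> SingingIndicator:
    """Extract the 2-bit indicator from a byte of 6 reserved bits + 2 indicator bits.

    Assumes MSB-first packing, i.e. the indicator is in the low two bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return SingingIndicator(byte & 0x03)


def is_non_singing(indicator: SingingIndicator) -> bool:
    """Non-singing sections carry descriptors instead of scrolling text rows."""
    return indicator != SingingIndicator.SINGING
```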

Table 3 lists the available descriptors in the Karaoke Text Elementary Stream. The tag is
an identification value for the descriptor. For each Karaoke song application, the
Karaoke Text Elementary Stream may carry multiple Textual Presentation Descriptors
but only one Interactive Links Descriptor may be present.



Table 3

Tag Value     Descriptor()                        Karaoke_Textual Loop   Interactive_Textual Loop
0xE1          Textual_Presentation_Descriptor()   *                      *
0xE2 & 0xE3   Interactive_Links_Descriptor()                             *

* Available

The syntax of the Textual_Presentation_Descriptor is illustrated in Table 4. The
Textual_Presentation_Descriptor describes the textual content that shall be displayed
by the decoder. It need not be restricted to text, as such. The word "textual"
throughout this document can represent both text and graphics.

Table 4

Syntax                                          No. of bits
Textual_Presentation_Descriptor() {
  Descriptor_Tag                                8
  Textual_Presentation_ID                       8
  Reserved                                      2
  Distribution_Flag                             3
  Data_Interpretation_Format                    3
  Textual_Data_Length                           16
  For (I=0;I<Textual_Data_Length;I++){
    Textual_Data                                8
  }
}



The semantic definitions of this descriptor are:

■ Descriptor_Tag - This 8-bit field provides an identification value indicating the
Textual Presentation Descriptor. It shall have a value of 0xE1.

■ Textual_Presentation_ID - This 8-bit field provides a unique identification value
for the following Textual Presentation Descriptor data. This value is used to
provide links in interactive applications. It shall have a value between 0x10 and
0xFF. A value of 0x00 is reserved for specifying the exit of the interactive
application. A value of 0x0F is reserved for specifying no action within the
interactive application. A value of 0x01 is used to activate the Actions() task.
Values between 0x02 and 0x0E are reserved.

■ Distribution_Flag - See Table 5.

■ Data_Interpretation_Format - See Table 6.

■ Textual_Data_Length - This 16-bit field specifies the length in bytes of the
following Textual_Data.

■ Textual_Data - The format of this 8-bit field data is defined by
Data_Interpretation_Format.
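
The descriptor layout of Table 4 can be read with a few shifts and masks. The following is a minimal sketch under stated assumptions: fields are packed most-significant-bit first and multi-byte lengths are big-endian, as is usual in MPEG-2 section syntax but not spelled out in this description. The class and function names are hypothetical.

```python
from dataclasses import dataclass

TEXTUAL_PRESENTATION_TAG = 0xE1  # Descriptor_Tag value given in the text


@dataclass
class TextualPresentationDescriptor:
    presentation_id: int
    distribution_flag: int
    data_interpretation_format: int
    textual_data: bytes


def parse_textual_presentation_descriptor(buf: bytes) -> TextualPresentationDescriptor:
    """Parse one descriptor laid out as in Table 4 (assumed MSB-first, big-endian)."""
    if len(buf) < 5:
        raise ValueError("descriptor header needs at least 5 bytes")
    if buf[0] != TEXTUAL_PRESENTATION_TAG:
        raise ValueError("not a Textual_Presentation_Descriptor")
    presentation_id = buf[1]
    flags = buf[2]  # 2 reserved bits, 3-bit Distribution_Flag, 3-bit format
    distribution_flag = (flags >> 3) & 0x07
    data_format = flags & 0x07
    length = int.from_bytes(buf[3:5], "big")  # Textual_Data_Length
    data = buf[5:5 + length]
    if len(data) != length:
        raise ValueError("descriptor is truncated")
    return TextualPresentationDescriptor(
        presentation_id, distribution_flag, data_format, data
    )
```

For example, the bytes `E1 10 11 00 02` followed by two data bytes would, under these assumptions, decode to Textual_Presentation_ID 0x10, Distribution_Flag 2 and Data_Interpretation_Format 1.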

Table 5: Distribution_Flag Definition

Distribution_Flag Value   Descriptions
0                         Free distribution. Existing textual display and editing optional.
1                         Free distribution. Existing textual display mandatory. No editing.
2                         Free distribution. Existing textual display mandatory. Editing optional.
3                         License distribution. Existing textual display and editing optional.
4                         License distribution. Existing textual display mandatory. No editing.
5                         License distribution. Existing textual display mandatory. Editing optional.
6-7                       Reserved



When the flag indicates "Existing textual display mandatory. Editing Optional", it 
means that the existing display can be added to, but nothing can be removed. 
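
The six defined Distribution_Flag values follow a regular pattern (free or licensed distribution, combined with three display and editing modes), which an editing tool could decode as in the sketch below. The function names and the returned tuple are hypothetical conveniences, not part of the described format; the permissions follow Table 5 and the sentence above about adding without removing.

```python
from typing import Tuple


def decode_distribution_flag(flag: int) -> Tuple[bool, bool, bool]:
    """Return (licensed, display_mandatory, editing_allowed) for flag values 0 to 5."""
    if flag not in range(6):
        raise ValueError("Distribution_Flag values 6-7 are reserved")
    licensed = flag >= 3            # 0-2: free distribution, 3-5: license distribution
    mode = flag % 3                 # 0: optional, 1: mandatory/no editing, 2: mandatory/editing optional
    display_mandatory = mode != 0
    editing_allowed = mode != 1
    return licensed, display_mandatory, editing_allowed


def may_remove_existing_display(flag: int) -> bool:
    """Existing content may be removed only when its display is not mandatory."""
    _, display_mandatory, _ = decode_distribution_flag(flag)
    return not display_mandatory


def may_add_to_display(flag: int) -> bool:
    """Content may be added whenever editing is allowed."""
    _, _, editing_allowed = decode_distribution_flag(flag)
    return editing_allowed
```

For a flag value of 2 or 5, `may_add_to_display` is true while `may_remove_existing_display` is false, which matches the "can be added to, but nothing can be removed" behaviour.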

Table 6: Data_Interpretation_Format Definition

Data_Interpretation_Format Value   Descriptions
0                                  Reserved
1                                  Karaoke Textual Presentation Format - Basic Level
2-7                                Reserved



In this embodiment, the Karaoke Textual Presentation Formats enable displaying of
desired visual content over the Karaoke text region during non-singing intervals of a
Karaoke song and/or over other parts of the display at certain parts of or even
throughout a song, or even between songs too.

The Basic Level Format is described in detail. It enables displaying of visual content
in the form of textual content over the Karaoke text region during non-singing intervals
of a Karaoke song. The complexity is kept minimal and the decoding requirements
are very similar to normal Karaoke text decoding. Additional functions define the
intended display position and the time for display, the foreground colour, the
background colour, and flashing display attributes. The Start_Display_Time and
Display_Time_Interval specify the intended time for display within the Karaoke
application. This enables sequencing of textual display and can be used to deliver
narrative description to the viewer. The Karaoke Textual Presentation Format - Basic
Level need not be restricted to use only over the Karaoke Text region. It can be
used for other parts of the TV display at both singing and non-singing sections within
the Karaoke application. The basic level is a simple text-based description language.
Higher levels of the Karaoke Textual Presentation Format may be defined to have
more complex features like graphics and other presentation engines.

The syntax of the Karaoke Textual Presentation Format - Basic Level is illustrated in 
Table 7. 

Table 7

Syntax                                                    No. of bits
Karaoke_Textual_Presentation_Format_Basic_Level() {
    Presentation_Data_Length                              16
    For (I=0;I<M;I++) {
        Start_Display_Time                                16
        Display_Time_Interval                             16
        Presentation_Display_Clear                        1
        Reserved                                          3
        Display_Data_Length                               12
        For (J=0;J<N;J++) {
            Reserved                                      2
            Display_Location_X                            6
            Reserved                                      2
            Display_Location_Y                            6
            Reserved                                      2
            Display_Feature                               6
            ISO_639_Language_Code                         24
            Reserved                                      2
            Text_Control_Data_Length                      6
            For (K=0;K<Text_Control_Data_Length;K++){
                Text_Control_Code                         8 or 16
            }
        }
    }
}


The semantic definitions are:

■ Presentation_Data_Length - This 16-bit field specifies the length in bytes of
the following data description.

■ Start_Display_Time - This 16-bit field specifies the time for displaying the text
rows. Each count is 20 msec. A value of 0xFFFF indicates that the display is
enabled immediately.

■ Display_Time_Interval - This 16-bit field specifies the time interval for
displaying the text rows. Each count is 20 msec. A value of 0xFFFF indicates
that the display is enabled until the next Presentation_Display_Clear value of '1' is
encountered or the Karaoke song application ends.

■ Presentation_Display_Clear - This 1-bit field informs the decoder to clear all
previous textual display.

■ Display_Data_Length - This 12-bit field specifies the length in bytes of the
following data description.

■ Display_Location_X - This 6-bit field specifies the X-axis coordinate for the
start of the textual display. A value of 0x3F indicates the default Karaoke text
display region.

■ Display_Location_Y - This 6-bit field specifies the Y-axis coordinate for the
start of the textual display. A value of 0x3F indicates the default Karaoke text
display region.

■ Display_Feature - A 6-bit field specifying the preferred display style. See
Table 8.

■ ISO_639_Language_Code - This 24-bit field contains the 3-character ISO 639
language code of the following text fields. Each character is coded into 8 bits
according to ISO 8859-1.

■ Text_Control_Data_Length - This 6-bit field specifies the number of text or
control characters in the upper display row.

■ Text_Control_Code - The code defining the text or control character. The
number of bytes used to represent each text character is determined by the
ISO_639_Language_Code field. The control codes for changing the display
attributes are tabulated in Table 9.
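
The fixed-width fields defined above can be unpacked with a few shifts and masks. The following is a minimal sketch, assuming the fields are packed most-significant-bit first in big-endian byte order (the usual MPEG-2 convention, not stated explicitly in this section); the class and function names are hypothetical. It decodes the 6-byte header of one display entry and the 7-byte header of one text row, and converts time counts to milliseconds.

```python
import struct
from dataclasses import dataclass
from typing import Optional

COUNT_MS = 20      # each time count is 20 msec
SPECIAL = 0xFFFF   # "immediately" / "until cleared" marker value


@dataclass
class EntryHeader:
    start_ms: Optional[int]      # None means display immediately
    interval_ms: Optional[int]   # None means until next clear or song end
    clear_previous: bool         # Presentation_Display_Clear
    display_data_length: int     # Display_Data_Length, in bytes


@dataclass
class RowHeader:
    x: int                       # 0x3F means default Karaoke text region
    y: int                       # 0x3F means default Karaoke text region
    feature: int                 # Display_Feature, see Table 8
    language: str                # 3-character ISO 639 code
    text_control_data_length: int


def parse_entry_header(buf: bytes, offset: int = 0) -> EntryHeader:
    """Decode Start_Display_Time .. Display_Data_Length (48 bits)."""
    start, interval, packed = struct.unpack_from(">HHH", buf, offset)
    return EntryHeader(
        start_ms=None if start == SPECIAL else start * COUNT_MS,
        interval_ms=None if interval == SPECIAL else interval * COUNT_MS,
        clear_previous=bool(packed & 0x8000),   # top bit; next 3 bits reserved
        display_data_length=packed & 0x0FFF,    # low 12 bits
    )


def parse_row_header(buf: bytes, offset: int = 0) -> RowHeader:
    """Decode Display_Location_X .. Text_Control_Data_Length (56 bits)."""
    raw = buf[offset:offset + 7]
    if len(raw) < 7:
        raise ValueError("row header needs 7 bytes")
    return RowHeader(
        x=raw[0] & 0x3F,                         # 2 reserved bits masked off
        y=raw[1] & 0x3F,
        feature=raw[2] & 0x3F,
        language=raw[3:6].decode("latin-1"),     # ISO 8859-1, one byte each
        text_control_data_length=raw[6] & 0x3F,
    )
```

The sketch stops at the headers because the width of each Text_Control_Code depends on the language, which this section does not enumerate.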



Table 8: Display Feature Definition

Display_Feature Value    Descriptions
0                        None.
1                        Scroll from Left.
2                        Scroll from Right.
3                        Insert from Left
4                        Insert from Right
5                        Insert from Bottom
6                        Insert from Top
7-63                     Reserved



Table 9: Control Codes for Display Attributes

Text_Control_Code Value    Descriptions
0x01                       Red
0x02                       Green
0x03                       Yellow
0x04                       Blue
0x05                       Magenta
0x06                       Cyan
0x07                       White
0x08                       Black
0x09                       Transparent
0x10                       Change Background Color
0x11                       Interchange foreground/background Color
0x12                       Flash
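
As an illustration of how a decoder might separate the control codes of Table 9 from displayable text, the sketch below tokenizes a sequence of Text_Control_Code bytes. It assumes a language in which each character occupies one byte and is decoded as ISO 8859-1; both are assumptions for illustration only, and the name `tokenize` is hypothetical. It only splits the stream and does not interpret how the attributes combine.

```python
from typing import List, Tuple

# Control codes exactly as tabulated in Table 9.
CONTROL_CODES = {
    0x01: "Red",
    0x02: "Green",
    0x03: "Yellow",
    0x04: "Blue",
    0x05: "Magenta",
    0x06: "Cyan",
    0x07: "White",
    0x08: "Black",
    0x09: "Transparent",
    0x10: "Change Background Color",
    0x11: "Interchange foreground/background Color",
    0x12: "Flash",
}


def tokenize(codes: bytes) -> List[Tuple[str, str]]:
    """Split a byte sequence into ("control", name) and ("text", str) tokens."""
    tokens: List[Tuple[str, str]] = []
    text = bytearray()
    for code in codes:
        if code in CONTROL_CODES:
            if text:  # flush any text collected before this control code
                tokens.append(("text", text.decode("latin-1")))
                text.clear()
            tokens.append(("control", CONTROL_CODES[code]))
        else:
            text.append(code)
    if text:
        tokens.append(("text", text.decode("latin-1")))
    return tokens
```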

The syntax of the Interactive Links Descriptor is illustrated in Table 10. The Interactive
Links Descriptor describes the linkages between the various Textual Presentation
Descriptors.

Table 10

Syntax                                                    No. of bits
Interactive_Links_Descriptor() {
    Descriptor_Tag                                        8
    if (Descriptor_Tag == [illegible]) {
        Reserved                                          3
        Karaoke_Text_Description_PID                      13
    } else {
        Node_Loop                                         8
        For (I=0;I<Node_Loop;I++) {
            Node_Name_Length                              8
            Node_Name                                     var
            Current_Node_ID                               8
            Next_Node_ID                                  8
            Previous_Node_ID                              8
            Ascend_Node_ID                                8
            Descend_Node_ID                               8
            if (Descend_Node_ID==0x01) {
                Action_Loop_Length                        16
                For (J=0;J<N;J++) {
                    Actions()
                }
            }
        }
    }
}


The semantic definitions of this descriptor are:

■ Descriptor_Tag - This 8-bit field provides an identification value indicating the
Interactive Links Descriptor. It shall have a value of 0xE2 or 0xE3.

■ Karaoke_Text_Description_PID - The PID value transporting the Karaoke Text
Description Table that contains the Karaoke Text Elementary Stream. The
decoder shall use the Textual Presentation Descriptors and Interactive Links
Descriptor in this stream for generating the interactive application.

■ Node_Loop - Number of node description loops.

■ Current_Node_ID - Contains the value of the current Textual_Presentation_ID.

■ Next_Node_ID - Contains the value of the next Textual_Presentation_ID. A
value of 0x00 is used for exiting the interactive application. A value of 0x0F
indicates no action.

■ Previous_Node_ID - Contains the value of the previous
Textual_Presentation_ID. A value of 0x00 is used for exiting the interactive
application. A value of 0x0F indicates no action.

■ Ascend_Node_ID - Contains the value of the Textual_Presentation_ID for the
immediate upper level of a menu tree structure. A value of 0x00 is used for
exiting the interactive application. A value of 0x0F indicates no action.

■ Descend_Node_ID - Contains the value of the Textual_Presentation_ID for the
immediate lower level of a menu tree structure. A value of 0x00 is used for
exiting the interactive application. A value of 0x0F indicates no action. A value
of 0x01 is used to activate the Actions() task.

■ Action_Loop_Length - This 16-bit field specifies the total length in bytes of the
following descriptors.

■ Actions() - Descriptors describing the task upon user activation.
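
The link fields above amount to a small state machine. The following is a minimal sketch of how a receiver might resolve a button press, assuming the node loop has already been parsed into memory; the `Node` class, the `navigate` function and the button names are hypothetical, and any node ID values used with it are chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

EXIT = 0x00         # exit the interactive application
NO_ACTION = 0x0F    # the button does nothing at this node
RUN_ACTIONS = 0x01  # Descend_Node_ID value that activates Actions()

# Maps a button name to the Node attribute holding its link.
BUTTONS = {
    "next": "next_id",
    "previous": "previous_id",
    "ascend": "ascend_id",
    "descend": "descend_id",
}


@dataclass
class Node:
    name: str                      # Node_Name
    current_id: int                # Current_Node_ID
    next_id: int = NO_ACTION
    previous_id: int = NO_ACTION
    ascend_id: int = NO_ACTION
    descend_id: int = NO_ACTION


def navigate(nodes: Dict[int, Node], current_id: int,
             button: str) -> Tuple[Optional[int], bool]:
    """Return (new_node_id, run_actions); new_node_id is None on exit."""
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button}")
    target = getattr(nodes[current_id], BUTTONS[button])
    if target == NO_ACTION:
        return current_id, False
    if target == EXIT:
        return None, False
    if button == "descend" and target == RUN_ACTIONS:
        return current_id, True   # stay on the node and run Actions()
    return target, False
```

The returned node ID is the Textual_Presentation_ID whose textual data the receiver would display next.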


The Guide Table provides Karaoke programme guide information for the viewer to 
navigate among Karaoke programmes. The syntax of the private section table for 
Guide Table is illustrated in Table 11. 

Table 11

Syntax                                                        No. of bits
Karaoke_Current_Guide_Table() {
    Table_ID (user defined: 0x9f)                             8
    Section_Syntax_Indicator                                  1
    Private_Indicator                                         1
    Reserved                                                  2
    Private_Section_Length                                    12
    Reserved                                                  2
    Version_Number                                            5
    Current_Next_Indicator                                    1
    Section_Number                                            8
    Last_Section_Number                                       8
    Karaoke_Application_Code_ID                               16
    Current_UTC_Time
    Reserved                                                  4
    Karaoke_Stream_Info_Length                                12
    For (I=0;I<N;I++){
        Descriptor()
    }
    Reserved                                                  3
    KTRT_PID                                                  13
    Number_of_Karaoke_Program                                 8
    For (I=0;I<Number_of_Current_Karaoke_Program;I++){
        Start_UTC_Time
        Stop_UTC_Time
        KFDT_Available                                        1
        Reserved                                              2
        KTDT_PID                                              13
        Reserved                                              3
        Audio_PID                                             13
        Reserved                                              4
        Index_Number_Length                                   12
        For (J=0;J<Index_Number_Length;J++){
            Index_Number                                      8
        }
        ISO_639_Language_Code                                 24
        Title_Text_Length                                     8
        For (J=0;J<Title_Text_Length;J++){
            Title_Text_Code                                   8
        }
        ISO_639_Language_Code                                 24
        Singer_Text_Length                                    8
        For (J=0;J<Singer_Text_Length;J++){
            Singer_Text_Code                                  8
        }
        Reserved                                              4
        ES_Info_Length                                        12
        For (J=0;J<N;J++){
            Descriptor();
        }
    }
    CRC_32                                                    32
}


The semantic definitions of the fields in the private section table follow the generic
syntax common in ISO/IEC 13818 [1]:

■ Table_ID - This 8-bit field identifies the private table this section belongs to.
ISO/IEC 13818 [1] defines specific descriptions from 0x00 to 0x3F. The Digital
Video Broadcasting (DVB); Specification for Service Information (SI) in DVB
systems (EN 300 468) uses values from 0x40 to 0x7F. User-defined values range
from 0x80 to 0xFE. Thus, the Table_ID for this table can range from 0x80 to
0xFE.

■ Section_Syntax_Indicator - Set to '0' to indicate that the privately defined data
bytes immediately follow the Private_Section_Length.

■ Private_Indicator - Not used.

■ Private_Section_Length - This 12-bit field specifies the number of remaining bytes
in the section immediately following the Private_Section_Length field up to the
end of the section.

■ Version_Number - A 5-bit field specifying the version number of the section table.
The Version_Number shall be incremented by 1 when a change in the information
carried in the section occurs.

■ Current_Next_Indicator - Set to '1' to indicate that the section is the currently
applicable section.

■ Section_Number - An 8-bit field giving the section number, which runs
sequentially.

■ Last_Section_Number - An 8-bit field specifying the last section number.

■ CRC_32 - This 32-bit field contains the CRC value. It can be used to check the
correctness of the data in this section.
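
The CRC_32 check can be sketched as follows. This section does not state the polynomial, so the sketch assumes the CRC used for MPEG-2 sections in ISO/IEC 13818-1 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR); the function names are hypothetical. With this CRC, running the calculation over a complete section, including its trailing CRC_32 bytes, leaves a remainder of zero.

```python
def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32 as used for MPEG-2 sections (assumed, see lead-in)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24                 # feed the next byte, MSB first
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def section_is_valid(section: bytes) -> bool:
    """True if the section, including its trailing CRC_32, checks out."""
    return len(section) >= 4 and crc32_mpeg2(section) == 0


def append_crc(body: bytes) -> bytes:
    """Return body followed by its CRC_32 in big-endian byte order."""
    return body + crc32_mpeg2(body).to_bytes(4, "big")
```

A bitwise loop is used for clarity; a table-driven version would be the usual choice where throughput matters.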

In this invention, the Karaoke program information is embedded in the Guide Table.
The semantic definitions are:

■ Karaoke_Application_Code_ID - A 16-bit unique identification. Set to 0x5459.

■ Current_UTC_Time - Local date and time in UTC format.

■ Karaoke_Stream_Info_Length - This 12-bit field specifies the length in bytes of
the following descriptors().

■ KTRT_PID - The PID value of the Karaoke Time Reference Table.

■ Number_of_Karaoke_Program - An 8-bit field specifying the number of Karaoke
programs currently running.

■ Start_UTC_Time - Starting time of the Karaoke program.

■ Stop_UTC_Time - Ending time of the Karaoke program.

■ KFDT_Available - A '1' indicates the presence of a Karaoke Font Download Table
associated with the following Karaoke Text Description Table.

■ KTDT_PID - The PID value of the Karaoke Textual and Interactive Description
Table.

■ Audio_PID - The PID value of the Audio Packetized Elementary Stream.

■ Index_Number_Length - Specifies the length in bytes of the following
Index_Number.

■ Index_Number - Index number of the Karaoke program.

■ ISO_639_Language_Code - This 24-bit field contains the 3-character ISO 639
language code of the following text fields. Each character is coded into 8 bits
according to ISO 8859-1.

■ Title_Text_Length - This 8-bit field specifies the length in bytes of the following
Title_Text_Code.

■ Title_Text_Code - Text describing the title of the song.

■ ISO_639_Language_Code - This 24-bit field contains the 3-character ISO 639
language code of the following text fields. Each character is coded into 8 bits
according to ISO 8859-1.

■ Singer_Text_Length - This 8-bit field specifies the length in bytes of the following
Singer_Text_Code.

■ Singer_Text_Code - Text describing the singer of the song.

■ ES_Info_Length - This 12-bit field specifies the length in bytes of the following
descriptors().

To enable the receiver to display the Karaoke Text together with the associated song,
the timing codes as well as the Karaoke text and the scrolling information are carried in
the Karaoke Textual and Interactive Description Table (KTDT) in the transport stream.
The syntax of the private section table for the Karaoke Textual and Interactive
Description Table is illustrated in Table 12.

Table 12

Syntax                                                    No. of bits
Karaoke_Textual_Interactive_Description_Table() {
    Table_ID (user defined: 0x9d)                         8
    Section_Syntax_Indicator                              1
    Private_Indicator                                     1
    Reserved                                              2
    Private_Section_Length                                12
    Reserved                                              2
    Version_Number                                        5
    Current_Next_Indicator                                1
    Section_Number                                        8
    Last_Section_Number                                   8
    Karaoke_Application_Code_ID                           16
    Start_Decoding_Time                                   24
    Karaoke_Text_Elementary_Stream()                      var
    CRC_32                                                32
}



The semantic definitions are:

■ Karaoke_Application_Code_ID - A 16-bit unique identification. Set to 0x5459.

■ Start_Decoding_Time_Reference - A 24-bit timer reference for the start of decoding
the following Karaoke_Text_Elementary_Data(). Used with the
Karaoke_Time_Reference carried in the Karaoke Time Reference Table (KTRT).
Each count is 20 msec.

■ Karaoke_Text_Elementary_Stream() - The elementary data containing the
Karaoke textual and scrolling information.

To synchronise the Karaoke Text display and the associated audio, the Karaoke Time 
Reference Table (KTRT) is introduced to set or update the Karaoke Text decoder 
timer with the source timing reference. The syntax of the private section table for 
Karaoke_Time_Reference_Table is illustrated in Table 13. 

Table 13

Syntax                                                    No. of bits
Karaoke_Time_Reference_Table(){
    Table_ID (user defined: 0x9c)                         8
    Section_Syntax_Indicator                              1
    Private_Indicator                                     1
    Reserved                                              2
    Private_Section_Length                                12
    Karaoke_Application_Code_ID                           16
    Karaoke_Time_Reference                                24
    CRC_32                                                32
}



The semantic definitions are:

■ Karaoke_Application_Code_ID - A 16-bit unique identification. Set to 0x5459.

■ Karaoke_Time_Reference - A 24-bit timer reference for the decoder to sync with the
Karaoke encoding time reference in order to synchronise Karaoke audio and text
decoding. Each count is 20 msec. Used with Start_Decoding_Time in the Karaoke Text
Description Table.
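
The Table 13 layout is short enough to decode by hand. The following is a minimal sketch, assuming MSB-first, big-endian packing and the example user-defined Table_ID of 0x9c shown in Table 13; the names `parse_ktrt` and `ms_until_decoding` are hypothetical. The CRC_32 is not verified here, and wrap-around of the 24-bit counter is not handled because this section does not define it.

```python
from dataclasses import dataclass

KTRT_TABLE_ID = 0x9C       # user-defined value shown in Table 13
APPLICATION_CODE = 0x5459  # Karaoke_Application_Code_ID
COUNT_MS = 20              # each count is 20 msec


@dataclass
class TimeReference:
    section_length: int   # Private_Section_Length, in bytes
    time_reference: int   # Karaoke_Time_Reference, in 20 msec counts


def parse_ktrt(section: bytes) -> TimeReference:
    """Decode one Karaoke_Time_Reference_Table section (12 bytes)."""
    if len(section) < 12:
        raise ValueError("section too short")
    if section[0] != KTRT_TABLE_ID:
        raise ValueError("not a Karaoke Time Reference Table")
    # Low 4 bits of byte 1 plus byte 2 form the 12-bit length field.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    if int.from_bytes(section[3:5], "big") != APPLICATION_CODE:
        raise ValueError("unexpected Karaoke_Application_Code_ID")
    time_reference = int.from_bytes(section[5:8], "big")
    return TimeReference(section_length, time_reference)


def ms_until_decoding(start_decoding_time: int, time_reference: int) -> int:
    """Milliseconds from the reference instant until decoding should start.

    A negative result means the start time has already passed.
    """
    return (start_decoding_time - time_reference) * COUNT_MS
```

For example, a Start_Decoding_Time of 750 against a Karaoke_Time_Reference of 500 means text decoding should begin 250 counts, or 5 seconds, after the reference instant.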


Figure 2 illustrates an example of an interactive application: a menu tree application
relevant to a Karaoke application of the present invention. Three buttons, Next,
Ascend and Descend (or their equivalents), are needed to navigate throughout the
application. The application is started when a user activates the Descend button of
the receiver. The textual display for the song title 2.1 is displayed first. The singer
information 2.2, the recording company information 2.3 and the message 2.4 can be
navigated using the Next button. The application exits 2.5 on use of the Next button
when the message display 2.4 is currently active. During the singer information 2.2
textual display, the application can descend to the album information 2.6 textual
display using the Descend button. The Next and Descend buttons can then be used
to navigate through items such as the concert information 2.7 and the titles of the
other tracks on the album 2.10, 2.11, 2.12.

"Detail" can be any "Node_Name". For example, for Concert, the loop data containing
"Node_Name" = "Concert" shall be the selected node. The "Current_Node_ID" in this
loop shall then be used to look for the "Textual_Presentation_Descriptor()" that
matches it, and the textual data whose "Textual_Presentation_ID" matches
"Current_Node_ID" shall be displayed. The semantics of Actions() have not
been defined in this document.


The navigation details within the application are described by the Next_Node_ID,
Ascend_Node_ID and Descend_Node_ID fields in the Interactive Links Descriptor.

This invention provides an effective means of adding further information such as
news flashes, advertisements or interactive content. In almost every Karaoke song,
there are intervals where the singing pauses and only music is played for some time
before the singing continues. Substantial non-singing sections also appear at the
start and end of Karaoke songs. During these non-singing intervals, additional
information may be displayed in the Karaoke text region, independent of the
background video content. Textual and interactive content for display outside the
Karaoke text region can also be inserted anywhere throughout the Karaoke song.
This provides additional advertising space as well as the opportunity to develop
interactive digital TV Karaoke. The textual and interactive content inserted for
display is encoded as data, which is transmitted more efficiently than content
encoded onto the video.

Figure 3 shows a Karaoke display 80 in use. The Karaoke text appears in a display
82 in a Karaoke text region near the bottom of the picture. The background video 82
takes up the majority of the display. An interactive display of visual contents 84 also
appears in the main part of the display 80, superimposed on the background video.

This invention introduces additional advertising space. It also creates an option for the
content creator to insert relevant textual and interactive content prior to distribution to
the media distribution companies or broadcasters. With such an option, the content
rights holder may insert desirable advertising messages that will be displayed to the
user regardless of the media distribution companies or broadcasters. As a result, the
content creator may choose to distribute the Karaoke content free and the total cost of
delivering the Karaoke content to the user can be reduced. Other scenarios are also
possible, not necessarily including advertising or relating solely to advertising.


Whilst the present invention has been described with respect to a specific encoding
approach and MPEG-2, other approaches and standards are also clearly applicable.
Karaoke in the present invention covers not only songs, where there is a background
video, music and a song text, but other similar applications, such as poetry readings
or the like, or even where there is no music.



